[Scalability] Enable Horizontal Pod Autoscaler (HPA) for Alcor deployment #679

kevin-zhonghao · 2021-08-15T19:15:55Z

Configure the cluster and network to meet the Metrics Server requirements:
- Metrics Server must be reachable from kube-apiserver by container IP address (or node IP if hostNetwork is enabled).
- The kube-apiserver must have the aggregation layer enabled.
- Nodes must have Webhook authentication and authorization enabled.
- The Kubelet certificate must be signed by the cluster Certificate Authority (or certificate validation must be disabled by passing --kubelet-insecure-tls to Metrics Server).
- The container runtime must implement the container metrics RPCs (or have cAdvisor support).
Deploy Metrics Server in the cluster.
Deploy a Horizontal Pod Autoscaler for each deployment, using a customized HPA YAML file.
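
As a rough illustration of the last step, a per-deployment HPA manifest might look like the sketch below. The Deployment name vpcmanager, the replica bounds, and the 70% CPU target are assumptions for illustration and are not taken from this PR. Clusters older than Kubernetes 1.23 would use apiVersion autoscaling/v2beta2 instead of autoscaling/v2.

```yaml
# Hypothetical HPA for one Alcor microservice; names and numbers are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vpcmanager-hpa          # assumed name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vpcmanager            # assumed Deployment name
  minReplicas: 1                # assumed lower bound
  maxReplicas: 5                # assumed upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # assumed target, percent of requested CPU
```

CPU utilization targets are computed against the containers' resource requests, so the target pods need resources.requests.cpu set for this metric to work.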

xieus · 2021-08-16T20:53:51Z

kubernetes/services/metrics-server.yaml

+metadata:
+  labels:
+    k8s-app: metrics-server
+  name: metrics-server
+  namespace: kube-system
+---


@kevin-zhonghao Kevin, where are those metrics stored? Is it possible to visualize those metrics somewhere?

@xieus Metrics Server collects resource metrics from Kubelets and exposes them in the Kubernetes apiserver through the Metrics API for use by the Horizontal Pod Autoscaler. It does not store any data; it is more like an API for querying current resource usage.

@kevin-zhonghao @xieus Is that similar to Elastic's Metricbeat? https://www.elastic.co/blog/kubernetes-observability-tutorial-k8s-metrics-collection-and-analysis

@cj-chung A little similar. In general, we don't use metrics-server as a monitoring solution or as a source of metrics for one. Currently it is used only by the HPA.
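
For context, Metrics Server is typically exposed through the aggregation layer by an APIService object like the one below. This is a sketch modeled on the upstream Metrics Server manifests, assuming the Service is named metrics-server in kube-system as in the labels quoted above; the exact contents of kubernetes/services/metrics-server.yaml in this PR may differ.

```yaml
# Registers the metrics.k8s.io/v1beta1 API with the kube-apiserver
# and routes requests for it to the metrics-server Service.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
  labels:
    k8s-app: metrics-server
spec:
  group: metrics.k8s.io
  version: v1beta1
  service:
    name: metrics-server        # assumed Service name
    namespace: kube-system
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
```

Once registered, current readings are served under /apis/metrics.k8s.io/v1beta1 (for example via kubectl top pods). Nothing is persisted, which is why a separate monitoring stack is needed for visualization or history.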

xieus

@kevin-zhonghao Thanks. A few comments.

xieus · 2021-08-27T15:15:31Z

kubernetes/services/vpc_manager_hpa.yaml

+      policies:
+        - type: Percent
+          value: 100
+          periodSeconds: 15


Does periodSeconds mean the time interval to check the percentage number?


Not really. periodSeconds: 15 is the length of the window the policy applies to: with type Percent and value 100, the HPA can remove up to 100% of the current pods within 15 seconds.
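
To make the semantics concrete, here is a sketch of a scaleDown section with comments. The Percent policy mirrors the values quoted above; the second Pods policy and its numbers are hypothetical, added only to show how selectPolicy chooses between policies. The fragment sits under spec of the HPA.

```yaml
behavior:
  scaleDown:
    policies:
      # At most 100% of the current replicas may be removed
      # within any 15-second window.
      - type: Percent
        value: 100
        periodSeconds: 15
      # Hypothetical second policy: at most 2 pods removed per 60 seconds.
      - type: Pods
        value: 2
        periodSeconds: 60
    # Min picks whichever policy permits the smaller change.
    selectPolicy: Min
```

With 10 replicas, the Percent policy alone would allow dropping to the minimum replica count within 15 seconds, while the Pods policy would allow at most 2 removals per minute; selectPolicy: Min applies the latter.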

xieus · 2021-08-27T15:17:27Z

kubernetes/services/vpc_manager_hpa.yaml

+      # The autoscaler will choose the strategy that affects the minimum number of Pods
+      selectPolicy: Min
+    scaleUp:
+      stabilizationWindowSeconds: 0


The value of stabilizationWindowSeconds differs between the scaleUp and scaleDown policies. Is it best practice to set stabilizationWindowSeconds = 0? Does 0 mean that the autoscaler will always respond to changes immediately?

The stabilization window restricts flapping of the replica count when the metrics used for scaling keep fluctuating.
When the metrics indicate that the target should be scaled down, the algorithm looks at the desired states computed during the specified window and uses the highest value.

For example, here we set

'scaleUp:
stabilizationWindowSeconds: 0'

so the HPA scales up the pods immediately when needed. We also set

'scaleDown:
stabilizationWindowSeconds: 300'

so when current metrics indicate that the pods could be scaled down, the HPA considers the desired states from the past 300 seconds before deciding whether to scale down now.
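
The two settings discussed above could be written as follows. This is a sketch of the behavior section only (it sits under spec of the HPA), with the window values taken from this thread; the worked numbers in the comments are an illustrative assumption.

```yaml
behavior:
  scaleUp:
    # No look-back: a higher recommendation is applied right away.
    stabilizationWindowSeconds: 0
  scaleDown:
    # Use the highest recommendation from the last 300 seconds.
    # Example: if recommendations in that window were 6, 4, and 3 replicas,
    # the HPA keeps 6 until that value ages out of the window.
    stabilizationWindowSeconds: 300
```

These values match the Kubernetes defaults (0 seconds for scaleUp, 300 seconds for scaleDown), so the asymmetry reflects the usual preference for adding capacity quickly and removing it cautiously.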


xieus · 2021-10-03T19:26:37Z

Preliminary tests in the Medina cluster appear quite promising.

The current 0.19 release already contains a lot of new features, so we decided to move this feature to the 12/30 release.

xieus · 2022-02-01T19:01:29Z

@yanmo96 We need to test this PR and get it merged by 2/28.

kevin-zhonghao and others added 9 commits October 7, 2020 14:49

24b9b76  commit message
08ec421  Merge branch 'master' of https://github.com/futurewei-cloud/alcor int…
         …o new_master
598f3aa  Merge pull request #10 from futurewei-cloud/master
         update
3f44da0  Merge pull request #11 from futurewei-cloud/master
         sync up
72f9591  Merge pull request #12 from futurewei-cloud/master
         Update
c26c1b2  Merge pull request #13 from futurewei-cloud/master
         update
ed7a10e  Merge pull request #14 from futurewei-cloud/master
         update
f4097e1  Merge pull request #15 from futurewei-cloud/master
         sync up
1485704  update

xieus requested review from cj-chung and xieus August 16, 2021 18:46

xieus assigned kevin-zhonghao Aug 16, 2021

xieus added deployment and upgrade k8s labels Aug 16, 2021

xieus added this to the Version 0.18.2021.08.30 milestone Aug 16, 2021

xieus reviewed Aug 16, 2021


xieus linked an issue Aug 17, 2021 that may be closed by this pull request

[Scalability] Enable Horizontal Pod Autoscaler (HPA) for Alcor deployment #635

Open

29ea149  update

xieus reviewed Aug 27, 2021


kevin-zhonghao and others added 5 commits September 11, 2021 11:28

7e92e06  Merge pull request #16 from futurewei-cloud/master
         Sync up
2c5e59c  Merge branch 'new_master' of https://github.com/kevin-zhonghao/alcor …
         …into k8s/hpa
dabc518  update
ff6d28c  Merge pull request #17 from futurewei-cloud/master
         [Data Plane Mgr] Fix issues from test scenario 4.5 (futurewei-cloud#688)
7420d5e  Merge branch 'new_master' of https://github.com/kevin-zhonghao/alcor …
         …into k8s/hpa


kevin-zhonghao commented Aug 15, 2021 • edited by xieus

xieus Aug 16, 2021

kevin-zhonghao Aug 16, 2021

cj-chung Aug 16, 2021

kevin-zhonghao Aug 16, 2021

xieus left a comment

xieus Aug 27, 2021

kevin-zhonghao Aug 27, 2021

xieus Aug 27, 2021

kevin-zhonghao Aug 27, 2021
